A novel lineage of the Capra genus discovered in the Taurus Mountains of Turkey using ancient genomics

Direkli Cave, located in the Taurus Mountains of southern Turkey, was occupied by Late Epipaleolithic hunters-gatherers for the seasonal hunting and processing of game including large numbers of wild goats. We report genomic data from new and published Capra specimens from Direkli Cave and, supplemented with historic genomes from multiple Capra species, find a novel lineage best represented by a ~14,000 year old 2.59 X genome sequenced from specimen Direkli4. This newly discovered Capra lineage is a sister clade to the Caucasian tur species (Capra cylindricornis and Capra caucasica), both now limited to the Caucasus region. We identify genomic regions introgressed in domestic goats with high affinity to Direkli4, and find that West Eurasian domestic goats in the past, but not those today, appear enriched for Direkli4-specific alleles at a genome-wide level. This forgotten ‘Taurasian tur’ likely survived Late Pleistocene climatic change in a Taurus Mountain refuge and its genomic fate is unknown.


Introduction
The genus Capra includes the domestic goat (Capra hircus) as well as a variety of wild mountaindwelling goat/ibex species distributed across Eurasia and North Africa including several listed as endangered or vulnerable (Pidancier et al., 2006;Shackleton, 1997). Nine species are currently recognized by the IUCN; however, taxonomic relationships are still under revision (Pidancier et al., 2006;Zheng et al., 2020). Among these, the status of the two species endemic to the Caucasus Mountains has been debated (Groves and Grubb, 2011;Parrini et al., 2009). The East Caucasian tur (Capra cylindricornis) has been considered either a species distinct from the West Caucasian tur (Capra caucasica) or they comprise a single species of two potentially-hybridizing populations (Heptner et al., 1961). Moreover the bezoar (Capra aegagrus), progenitor of domestic goat, has also been reported to hybridize with both tur varieties with which it shares seasonal grazing territories in the Caucasus region (Pfitzenmayer, 1915;Sarkisov, 1953;Weinberg, 2002). Interspecies Capra gene flow is well known (Kazanskaia et al., 2007;Manceau et al., 1999;Pidancier et al., 2006),       ,000 year old goat from the Zagros Mountains. A Late Pleistocene wild goat from the Armenian Lesser Caucasus, Hovk1, shows highest affinity with Direkli4 ( Figure 2-figure supplements 4 and 5). Bezoar goats from Direkli Cave also show high Direkli4 allele sharing, mirroring affinity measures with west Caucasian tur (Zheng et al., 2020). While directionality is uncertain, these statistics imply gene flow between the tur-like lineage and wild bezoar.
Examining domestic goats we find that Neolithic genomes from Europe show greater affinity to Direkli4 (Figure 2-figure supplement 4), but Neolithic Iranian goats do not, echoing the distribution of Direkli bezoar-related ancestry in West Eurasian populations (Daly et al., 2018). We account for possible gene flow from Caucasian tur into modern European goats using the statistic D(Tur1, Direkli4; X, Sheep) to compare relative affinity with Tur1 and Direkli4 (Supplementary file 1I). With the exception of two other tur samples, all examined domestic/bezoar goats show either a bias towards Direkli4 or gave a non-significant result, consistent with Direkli4-related admixture or a more complicated genetic history.
Genetic exchange between bezoar and the ancestors of Direkli4 could confound these measures of shared variation among domestic populations. We identified variants specific to Direkli4, conditioned on ancestral allele fixation in a range of defined groups ( Figure 3A, see Materials and methods). Using this we calculate a statistic analogous to the D statistic, here termed the extended D or D ex . D ex measures the relative degree of allele sharing, derived specifically in a selected genome or group of genomes, and may have some utility in genera with complex admixture histories or admixture from ghost lineages. Relative to Neolithic Zagros goats, ancient domestic genomes from western Eurasia have an excess of Direkli4-specific variants ( Figure 3B, Figure 3-figure supplement 1, Supplementary file 1J). This 'Direkli4-specific' allele sharing signal is absent in ancient goats from Iran-eastwards, and in all tested modern goat ( Figure 3C). To control for possible reference biases, we calculated D ex ascertaining on variants segregating in sheep (Supplementary file 1K) and recovered similar results (Pearson's r=0.9935). Repeating the analysis using other ancient/historic Capra 'specific' alleles shows somewhat correlated results (Supplementary file 1L), but the distinct patterns of allele sharing (Figure 3-figure supplements 2 and 3, Supplementary file 1M) imply that Direkli4 ancestry in domestic goat varies temporally and geographically.
We next identify alleles derived in Direkli4 also at a low frequency (>0%,≤10%) in other Capra and bezoar, and then measure their abundance in domestic goats. The west Caucasian tur (Tur1) most frequently shares derived alleles with Direkli4 and domestic goat ( Figure 3-figure supplement 4), consistent with their cladal relationship (although this measure is sensitive to genome depth, see Methods). Ancient European domestic goats share a higher proportion of alleles with both Direkli4 and the high-coverage bezoar from Direkli Cave, Direkli1-2. In comparison, Modern European and African goats carry variation present in Direkli4 plus one of the two Caucasian tur (Tur1 and Caucasus1). This discrepancy could be explained by either gene flow from domestic goats into tur during the last 8000 years, or alternatively an increase in tur-Direkli4 related ancestry in European populations over time.
Investigating gene flow events within Capra, automated tree-based model exploration (Pickrell and Pritchard, 2012) detects admixture between the Direkli4/Tur lineage and the ancestors of the Late Pleistocene bezoar Hovk-1 (Figure 3-figure supplement 5). Residuals of this graph point to unmodeled affinity between Direkli4 and both Direkli Cave bezoar and with Neolithic Serbian domestic goat (Figure 3-figure supplement 6). Modern European goat do not show unmodeled Direkli4 affinity, supporting the interpretation that Direkli4-related ancestry has declined with time in west Eurasian goats. A reduced set of populations explored using ML network orientation (Molloy et al., 2021) reiterates the Tur1/Direkli4 and Direkli bezoar lineages admixture, and also between Direkli bezoar and domestic goat (Figure 3-figure supplement 7). Investigating admixture graph space (see Materials and methods, Maier et al., 2022) we find 2 admixture events best explain how a     can be modeled. A majority (6/11) of graphs model Direkli bezoar as containing ancestry related to Direkli4 (median 1.5%, mean 5.2%; best fitting graph is shown in Figure 3-figure supplement 8), with a single graph modeling the opposite (2% Direkli bezoar ancestry). While the graph space explored is limited, these results suggest a greater degree of 'Direkli4 to Direkli bezoar' gene flow than 'Direkli bezoar to Direkli4'.
We finally identify 3 out of 112 regions introgressed from other Capra species to domestic goat (Zheng et al., 2020) which show high affinity with Direkli4 ( Figure 3-figure supplements 9 -11, 21-22). A further nine regions appear to have most affinity with the Direkli4-tur clade (Figure 3figure supplements 12-20), including a locus encompassing MUC6, a target of selection in domestic goats during the last 10,000 years (Zheng et al., 2020), implicating the Direkli4 lineage in the makeup of domestic goat gene pool.

Discussion
Our results indicate that a lineage related to the Caucasian tur existed in the Taurus Mountains during the Late Pleistocene, as late as the 12th millennium cal BCE. Based on the current, limited genomic data from the Capra genus, which we improve on here, this lineage appears to be a sister group to the tur C. caucasica and C. cylindricornis. Similar to other mammalian groups (Gopalakrishnan et al., 2018;Palkopoulou et al., 2018;Zheng et al., 2020), admixture likely occurred among Capra lineages; the population reported here carries bezoar-associated mtDNA and a possible small amount of bezoar nuclear genome ancestry (2% from 1/12 graphs). This Taurasian tur population is itself a possible candidate for the source of Tur-like ancestry present in domestic goats, including an introgressed MUC6 allele fixed in modern populations which increases gastrointestinal parasite resistance (Zheng et al., 2020). Given the relative paucity of Capra genomic data available compared to other mammalian groups, additional genomes from the genus will help refine the history of divergences and gene flow events which shaped the group's evolution.
We suggest this novel 'Taurasian tur' lineage be designated Capra taurensis following IUCN convention (Weinberg and Lortkipanidze, 2020) or Capra caucasica taurensis under a subspecies classification (De Queiroz, 2007;Wilson and Reeder, 2005). The Taurasian tur may have diverged from the Caucasian lineages 130-200kya based on mtDNA coalescent estimates (Bouckaert et al., 2014;Daly et al., 2018). The current distribution of Capra species is mostly discontinuous and is suggestive of climate-induced fragmentation (Shackleton, 1997). The ancestors of Caucasian tur likely extended over a broader range in Eurasia during the Late Pleistocene but may have been poorly captured by the fossil record (Crégut-Bonnoure, 1991;Uerpmann, 1987;Weinberg, 2002). The large variability and high upper size range of Capra remains are consistent with both smaller-bodied bezoar and largerbodied tur-relatives being present within the faunal assemblage at Direkli as well as other sites in the                     (Gavashelishvili et al., 2018) while experiencing a severe matrilineal bottleneck ( Figure 2B). C. taurensis appears to have produced fertile offspring with other members of the Capra genus; the traces of shared ancestry in ancient bezoar and likely managed goat ( Figure 3B) may be the consequence of direct gene flow or secondarily via admixed bezoar. Gene flow between early managed goats and Anatolian bezoar carrying C. taurensis ancestry could partially explain the divergence between Zagros and more westerly herds.
Given the tremendous pressure on Capra species via Anthropocene over-hunting and habitat disruption (Shackleton, 1997), it is assumed that C. taurensis is extinct, with its existence only now revealed via palaeogenomics. The Caucasian tur's preference for snowier habitats (Gavashelishvili et al., 2018) combined with the lower altitude of the Taurus Mountains relative to the Caucasus ( Figure 1A) may have rendered the lineage vulnerable to climatic change via Holocene warming and interspecific competition with bezoar, which are still found in the Taurus mountains (Gavashelishvili, 2009;Naderi et al., 2008), leading to its hypothesised extinction. As the history of C. taurensis following the Late Pleistocene is still unknown, further genomic surveys of Holocene Capra remains and present-day populations, such as the VarGoats project (Denoyelle et al., 2021), from this and adjacent regions may illuminate its genetic legacy.

Materials and methods
Materials and methods summary DNA from 7 postcranial bone elements from Direkli Cave and 13 historic Capra specimen was extracted via standard aDNA protocols (Yang et al., 1998), with a 0.5% sodium hypochlorite pre-wash (Korlević et al., 2015) performed for the Direkli material. Following uracil excision (Rohland et al., 2015) and dsDNA library construction , libraries were subject to shotgun sequencing (Illumina HiSeq 2000 and NovaSeq 6000) or RNA-bait enrichment of mtDNA reads prior to shotgun sequencing. Additional sequencing data was also generated for specimen Direkli4.

Direkli Cave
Direkli Cave (37°51'28.08"N, 36°39'13.98"E) is located in the Central Taurus mountains, north-west of Kahramanmaraş Province, near the village of Döngel, Turkey (Erek, 2010;Erek, 2012). It was discovered and first excavated by Kökten, 1960. The cave sits in a south-facing limestone escarpment at an elevation of 1100 m above sea level. It is located at the base of the Deli Höbek mountain and adjacent peaks which rise up to 2200 m immediately to the northeast. The cave is located above a small alluvial fan on the eastern side of the north-south trending valley draining the Tekir river (and          the D825 highway), itself a tributary to the Ceyhan, a major river system which flows into the Mediterranean. The site is located on the edge of a highly dissected upland and within a corridor providing easy access both to the Mediterranean coast as well as the interior of southern Turkey. It is therefore situated along a convenient route for seasonal movements between higher and lower elevations for both humans and other animal species.
In 2007, new excavations under the direction of C. M.
Erek were initiated with the support of the Turkish Ministry of Culture and Tourism in order to further explore the late Pleistocene deposits in the cave, a period that is poorly documented for the central Taurus mountains. Work carried out since that time has identified a stratigraphic sequence rich in artifacts and features dating to the late Pleistocene with lithic parallels both with the Levantine Natufian as well as sites on the Turkish Mediterranean coast. Radiocarbon dates derived from charcoal and animal bone from layers containing microlithic tools, date the main prehistoric occupation layers of the site to between 12,100-8900 years calibrated BCE. The majority of these dates place the occupation in the late Pleistocene prior to the Younger Dryas although the cave was used as late as the late 10th millennium BCE as well.
Excavations were carried out following the natural and cultural stratigraphic layers. Sediments of different colors in each plan square were collected in different buckets and passed through a triple screening system. The sediments, which were all subjected to water screening, were sieved through fine (1 mm), medium (2 mm), and coarse (4 mm) sieves and the materials in each sieve were dried in the shade. These recovery techniques have produced a rich faunal assemblage indicative of the local fauna exploited by the cave's occupants. This faunal community is dominated by the remains of wild goats (Capra spp-the focus of this study) but also deer, boar and black bear as well as a variety of fur bearing carnivores.
Capra specimens sequenced in this study derive from excavation areas located in the North portion (interior) of the cave (Figure 1-figure supplement 1). Stratigraphic affiliation of each specimen is reported in Table 1. Specimens were recovered from stratigraphic layers 4-8 and all derive from deposits containing material culture (especially microliths) consistent with a late Pleistocene date. In general, Capra specimens are highly fragmented and many exhibit cut and percussion marks clearly indicating an anthropogenic origin for the assemblage. There is no evidence that Capra remains accumulated through natural processes or carnivore denning behaviours. Direct AMS dates were acquired for two specimens (Direkli4 [Beta-432464: 12130+/-40] and Direkli2 [Beta-425280: 11370+/-40]) confirming their late Pleistocene provenience (12,200 cal BCE). Four specimens derive from grid square B/13 (Figure 1-figure supplement 2) including stratigraphic levels 4 A, 4 C, 7B, 7 C providing a temporal sequence from youngest to oldest including Direkli2, Direkli15, Direkli16, Direkli1.

Sample preparation
Material from the Muséum national d'Histoire naturelle collections in Paris were sampled on-site. Newly screened specimens from Direkli Cave and Dariali-Tamara Fort were sampled in dedicated ancient DNA facilities in TCD, Dublin, following standard protocols (Pääbo et al., 2004).
Sampled materials were cleaned with a drill bit and then subject to UV for 30 min, flipping midway to decontaminate both sides, followed by pulverization using a Mixer Mill (MM 200, Retsch).

DNA extraction and library preparation
For MNHN and Tamara Fort material,~150 mg of bone powder was subject to EDTA-prewash and proteinase K 3 day extraction as previously described (Daly et al., 2018;Gamba et al., 2014;MacHugh et al., 2000;Yang et al., 1998). For newly screened Direkli Cave material, samples were subjected to a 0.5% hypochlorite wash prior to EDTA wash and proteinase K-based extraction (Boessenkool et al., 2017;Daly et al., 2021). Previously described protocols for DNA cleanup (Daly et al., 2018;Daly et al., 2021), Uracil-DNA-glycosylase treatment Rohland et al., 2015 of 16.25 µl DNA followed by dsDNA library construction . Control tubes were included for extraction and library steps and kept for subsequent analysis, but did not show evidence of contamination.
Samples selected for deeper sequencing (typically >10% endogenous DNA, and including additional sequencing of Direkli4 originally reported in Daly et al., 2018) were sequenced on either a Illumina HiSeq 2000 (100 bp SE; Macrogen, Seoul) or NovaSeq 6000 (100bp PE; TrinSeq, Dublin). Deeper sequenced data was aligned as above, with a subsequent indel realignment step (The Broad Institute, 2018) and softclipping of reads by reducing the base quality of the first and last four bases to zero using a custom python script. Genome depth was estimated using GATK and presented along with alignment statistics in Supplementary file 1D.
For alignment to the sheep reference, the above pipeline was used substituting ARS1 with the sheep reference Oar_rambouillet_v1.0 available at https://www.ncbi.nlm.nih.gov/assembly/GCF_ 002742125.1/. Samples aligned to sheep are indicated in Supplementary file 1C.

Mitochondrial capture
Samples not selected for deep genomic sequencing were enriched for mammalian mtDNA sequences using an in-solution RNA hybridization approach using custom baits as previously described (Daly et al., 2018;Gnirke et al., 2009;Maricic et al., 2010;O'Sullivan et al., 2016) Daicel Arbor Biosciences, Ann Arbor, USA. Libraries enriched for mtDNA were subsequently sequenced using a MiSeq platform (50 bp SE; TrinSeq, Dublin).

Mitochondrial genome alignment and phylogeny
Reads were aligned to circularized (15 bp either end) mitochondrial reference genomes as previously described (Daly et al., 2021), realigning to closer mtDNA references to maximize sequence recovery. Sample fasta files were producing using ANGSD v0.922 (-doFasta2 -setMinDepth 3 -minQ 20 -minMapQ 30 -trim 4; Korneliussen et al., 2014) and decircularized by removing 15 bp at both ends of the resulting fasta sequence. Due to the low coverage of Direkli17, this sample fasta was produced with -setMinDepth 1 and analyzed independently of others.
A multiple sequence alignment of data generated here plus published Capra mtDNA sequences (Supplementary file 1E) was generated using MUSCLE (Edgar, 2004). A ML phylogeny with 100 bootstrap replicates was then computed using phyML (Guindon et al., 2010) using a BIC-selected substitution model, and visualized using Figtree (Rambaut, 2009).
Pairwise differences among aligned mtDNA sequences was calculated using a custom Python script, ignoring sites where one of the two samples were missing data ('N'). For F lineage comparisons, Direkli17 was excluded due to its low coverage and the higher error rate implicit in the -setMin-Depth 1 option used for that sample.

Modern alignment
Modern genomes (Supplementary file 1C) were aligned to ARS1 or Oar_rambouillet_v1.0 using bwa mem v0.7.5a-r405 (Li, 2013;Li et al., 2009;Li et al., 2008) and following GATK Best Practices, removing duplicates and reads with mapQ ≥30. Among these were 3 modern Tur genomes, which were included in the IBS calculation and nj tree (below) with permission from the VarGoats consortia (Denoyelle et al., 2021) prior to these samples' in-depth analyses. Samples aligned to sheep are indicated in Supplementary file 1C.

Error estimation
To estimate the error rates of the Direkli specimen and Capra genomes reported here, the ANGSD 'perfect genome' approach was employed, using a high coverage Old Irish Goat ('IOG',~42 X): -doAncError 1 -ref IOG. fa, where IOG. fa was generated using ANGSD -doFasta 1 -doCounts 1 -setMinDepth 13 -setMaxDepth 76 C 50 -minQ 20 -minMapQ 30. Sheep was used as the outgroup. Error rates were reasonable, with all falling below 0.5% and the highest reaching 0.3428%; the highest Direkli genome error was just 0.1947%. Error rates are shown in Supplementary file 1A and C (for previously published Direkli bezoar and the Tur1 specimen).
We additionally assessed error rates of ancient vs modern individuals by plotting distance from the outgroup as calculated from Identity-By-State statistics (see below). We expect high error individuals, particularly ancient ones, to show inflated distances from the outgroup (sheep). Plotting distances (Figure 1-figure supplement 7), the majority ancient Direkli specimens do not show excessive distance to the outgroup, with the highest distances being observed in historic and modern European (alpine and Iberian) ibex. A single Direkli wild goat / Capra aegagrus shows elevated distance to the outgroup relative to modern Capra aegagrus (0.240308), but all other Direkli specimens have unremarkable distance values.

D statistics
D statistics were computed using -doAbbaBaba 1 -rmTrans 1 -doCounts 1 and using sheep as above to define the ancestral allele. Correlation between D statistical tests was measured by calculating Pearson's correlation coefficient r using the cor() function of R Development Core Team, 2021. D test results are presented in Figure 2-figure supplements 4 and 5, and Supplementary file 1GH and I. A subset of D statistic tests were also repeated using sheep aligned-data (see above); Pearson's r for the test indicated in Supplementary file 1FG and H were 0.927, 0.950, and 0.494, respectively. The latter is not concerning as the D statistics in question (Direkli4, Tur1; X, Sheep) are overwhelmingly large for goat and sheep-aligned data (i.e. |D|>>0.05) and are consistent in directionality.
To calculate node support, pseudo-bootstrap datasets were generated by sampling-withoutreplacement 50 5 Mb regions a total of 100 times, and calculating IBS values for each of the 100 pseudoreplicates. Neighbour-joining trees were calculated as above and node supports applied to the base tree using RAxML (Stamatakis, 2014), and are presented in Figure 1B.
To control for possible reference genome effects, an additional IBS neighbour-joining tree was computed using sheep-aligned data using the following ANGSD settings, allowing for 90% site missingness due to the smaller number of genomes used: -doMajorMinor 4 GL 2 -minFreq 0.05 -minInd 41. Node support values were calculated as above, using 50x5 Mb regions per replicate. This tree is presented in Figure 1-figure supplement 5, with (A) and without (B) the lower coverage Direkli16 sample which falls within the 'taurasian tur' clade. Site counts for both trees are 96,930 and 110,672 respectively.
We additionally constructed a MDS plot using the sheep-aligned MDS data and R functions cmdscale( as. dist()) on the IBS.ibsMat file (Figure 1-figure supplement 6A), with an insert of the corner of the plot containing the taurensis tur lineage. The MDS places the nubian, alpine, and european ibex specimens, as well as tur, in a distinct part of the MDS plot away from a Capra aegagrus/ hircus group and a Capra sibirica/falconeri (Siberian ibex and markhor). Within the west Eurasian ibex & tur genomes, the relative affinities seen in the nj tree ( Figure 1B) are reiterated, with the tur group being closer to the European ibex than to Nubian ibex. The taurasian tur genomes Direkli4 and Direkli16 clearly fall within the Tur group. Both eastern and western tur genomes form distinct groups in MDS space, with Direkli tur showing somewhat higher affinity to the eastern tur group. We additionally computed IBS using sheep-aligned, ancients & historic samples only, allowing 20% missingness to account for the relatively high number of low coverage samples . From the resulting IBS matrix (computed from 464,799 biallelic transversion sites), an MDS plot was produced (Figure 1-figure supplement 6B). This plot offers less discriminatory power within the 'ibex / tur' group, as expected given the lower number of Capra genomes and overall lower coverage among the non Capra hircus / Capra aegagrus genomes. However, both Direkli 'taurasian tur' genomes fall clearly within the 'ibex / tur' group.

Haploid calling
To produce a haploid (random read sampling) callset of genomes with at least 0.5 X coverage for subsequent analyses the ANGSD -doHaplocall function was employed: -doHaploCall 1 -minInd 2 -minQ 20 -minMapQ 30 -remove_bads 1 -uniqueOnly -doCounts 1 C 50 -trim 4. Output files were then screened using a custom python to retain biallelic sites only and remove CpG sites (232,685,198 remaining), and then converted to plink format (Purcell et al., 2007) using the haploToPlink tool provided with ANGSD. An additional dataset consisting of genomes ≥2 X coverage was also generated for Treemix and Orientagraph analyses (below).

Outgroup/sheep ascertained sites
To create an outgroup-ascertained set of variant sites for Treemix/Orientagraph, Minor Allele Frequencies were computed dataset of 11 ARS1-aligned sheep (Supplementary file 1C).

Introgressed regions
To examine possible sources of 112 haplotypes identified as introgressed in domestic goat (Zheng et al., 2020), IBS values were calculated using ANGSD (-doIBS 1 -doMajorMinor 5 -minInd 157 -minMaf 0.02). The -minInd parameter was set to ensure only sites with ≤10% missing data across the 175 individuals were considered, and -minMaf so that alleles found at least one Capra source and two domestic genomes were included. Genomes with ≥1 X mean coverage were included, plus the Capra samples of interest. Introgressed region coordinates were obtained from Data Supplementary file 1 of Zheng et al., 2020. Heatmaps from pairwise ibs data were constructed using the R gplots heatmap.2() function (Warnes et al., 2019), first removing Nubiana1 and Caucasus1 due to coverage effects, and sheep samples due to distortion of ibs distance value scales. Heatmaps and hierarchical clustering of each putatively-introgressed region are displayed in Figure 3-figure supplements 9-22. Additionally, nj trees were constructed for each IBS matrix using the nj() function and visually assessed to determine if the Direkli4 lineage is a possible source of introgressed haplogroup. All 10 nj trees are shown in Figure 3-figure supplements 21 and 22. Assessing nj trees and heatmaps, 3 regions out of the 112 total are plausible as Direkli4 being the closest-to-domestic sample (1: 104,150,173-104,349,720; 2:24,324,410-24,369,675; 13:66,710,508-66,749,824); a further 7 may fit with the Direkli4-Tur1 clade being closest to domestic. These assignments should be considered preliminary in lieu of higher quality genomic data for the two Caucasian tur and the Direkli4 lineages.

Identity of MNHN sample Falconeri1
Falconeri1, thought to be a Capra falconeri hepteneri (Tadjik or Bukharan markhor) from the Muséum national d'Histoire naturelle (MNHN-ZM-AC-2009-243), shows the genetic profile aberrant relative to its supposed species. The mitochondrial sequence of Falconeri1 groups with the Barbary sheep or aoudad (Ammotragus lervia), a member of the Caprini tribe but not of the genus Capra (Figure 2figure supplement 1). The mtDNA does not form a clade with the two other markhor sequences included in the phylogeny. Consistent with this is the position of Falconeri1 in IBS-based nj trees, which places the sample basal to all other non-outgroup samples, and does not group with other markhor samples ( Figure 1B, Figure 1-figure supplement 5). We conclude that the sample was misidentified during storage or sampling, and that the genetic sample labeled here as Falconeri1 is more likely to be a Barbary sheep. As such the sample was excluded from subsequent analyses.

Extended D/Direkli4-specific alleles
We extended the general idea of the D statistic (Green et al., 2010;Patterson et al., 2012) and group D statistic (Soraggi et al., 2018) to identify variants derived in a genome/population of interest, H3, and ancestral in a set of other defined genomes/populations and an outgroup (i.e. multiple 'outgroups', see Figure 3A). The number of times the derived allele (i.e. the H3 'specific' allele) was observed in a reference individual/genome H1 was counted (nBABA ex ), as was the number of times the derived allele was observed in the target individual/genome H2 (nABBA ex ). The difference between nABBA ex and nBABA ex was then divided by the sum of nABBA ex and nBABA ex , analogous to the D formula: A Z value can then be computed from bootstraps estimate of the standard error using 5 Mb genome blocks and 1000 bootstrap replicantes, to normalize the D ex value.
Sites must be covered at least once in each defined group ('outgroups' or otherwise). The ancestral allele was also conditioned on being at 100% frequency in each of the 'outgroup' groups, but conceptually this criteria could be slackened to investigate patterns of derived allele sharing (i.e. derived variants in H3 that are present at some frequency e.g. >0%,≤10% in one of the defined 'outgroups'). Calculations of D ex were based on the biallelic CpG-removed haploid-called sites (232,685,198 total), computing with and without transitions, and performed using a custom python script.
Initially, we examined the sharing of Direkli4-specific variants relative to a population of ~10,000 year old Neolithic domestic-like genomes from the Zagros Mountains (Daly et al., 2021), requiring the Direkli4-derived allele to be fixed for ancestral and covered at least once each in the following groups: • Direkli bezoar • Other bezoar (ancient and modern) • Other Capra • Sheep (using the 11 genomes defined in Supplementary file 1C) ultimately defining the ancestral state This would therefore identify variants specific to the Direkli4 lineage, and exclude those shared with the other Capra genomes analyzed here (e.g. Caucasian Tur) or the Direkli bezoar (which have likely experienced gene flow with the Direkli4 lineage, see Figure 2-figure supplement 4). D ex values were highly similar when computed using transversions only ( Figure 3B) or all variants (Figure 3-figure  supplement 1, for both see Supplementary file 1J). To control for reference genome effects we also calculated D ex requiring the Direkli4-derived allele to segregate in sheep, with highly correlated results (Pearson's r=0.9935, Supplementary file 1J).
We repeated the D ex calculation but varying H3 to be a different genome of interest (Figure 3figure supplements 2 and 3, Supplementary file 1M). Capra aegagrus from Direkli Cave (Direkli1-2, Direkli5, Direkli6) show highest levels of shared-specific ancestry, occurring in European and African goat; this ancestry is only somewhat correlated with Direkli4-specific ancestry (r=0.6562-0.5731) and likely reflects gene flow from Anatolian bezoar to the ancestors of west Eurasian domestic goat (e.g. Figure 3-figure supplements 5 and 7). A single markhor (Cfalconeri04, SAMN10736157) may have east Asian related gene flow in its ancestry, while derived alleles signals of multiple Capra genomes in sub-Saharan goat imply gene flow from a Capra source into these domestic populations.
We observed positive D ex values for Direkli4 and ancient west Eurasian goats, but not modern goats ( Figure 3B). To assess whether technical biases may artificially cause Direkli4 to share more alleles with ancient samples, we examined the correlation with D ex when other Capra genomes define the "specific" allele ( Figure 3-figure supplements 2 and 3, Supplementary file 1L). Correlations of specific allele sharing values with C. caucasica, the Palaeolithic Armenian bezoar Hovk1, and medieval C. cylindricornis Caucasus1 were highest (Pearson's r=0.9258, 0.9231, 0.8824 respectively) (Supplementary file 1L). The relatively high range of correlations with historic Capra genomes (r=0.7194-0.8363) suggests some technical bias, but does not completely explain the pattern of Direkli4-specific allele sharing.
As mentioned above this 'extended D' approach was amenable to allowing the H3-specific variant to also segregate at a defined rate among 'outgroups', effectively 'disentangling' patterns of shared derived alleles. We investigated Direkli4-specific allele sharing, cycling through different target H2 genomes, for variants also >0%,≤10% across 'outgroups' (but fixed as ancestral in sheep). For each target H2, the total number of times a given 'Outgroup' individual shared the 'Direkli4-specific' allele was recorded, expressing as a proportion of the total >0%,≤10% shared variants in a heatmap format (Figure 3-figure supplement 4). While sensitive to coverage (as deeper sequenced samples will on-average make up a greater proportion of the total number of shared 'Direkli4-specific' variants), differences are apparent between Direkli4-specific allele sharing between ancient and modern European goat; the former share more Direkli4 alleles with Direkli bezoar, while the latter share more Direkli4 alleles with the Caucasian tur genomes Tur1 and Caucasus1. This pattern could be explained by a turnover in European goat populations to one with a greater Caucasian tur-related ancestry, a technical bias that increases affinity between the Direkli bezoar and ancient European goat or Tur genomes and modern European goat, or gene flow between a population related to modern European goat and Caucasian tur. These hypotheses could be tested with finer temporal sampling of European goats, or a time series of Caucasian tur to determine if gene flow from domestic populations occurred.
To confirm our results were unlikely due to choice of reference genome, we recalculated a subset of D ex statistics with data aligned to sheep, with and without ascertaining in the sheep outgroup population and examining tests related to Direkli4-specific variants in domestic groups (Supplementary file 1J and K). Correlation with goat-aligned data was high (r=0.897 and 0.906 respectively using transversions-only), suggesting that a reference bias towards ARS1 was unlikely driving the observed patterns in Direkli4-sharing allele sharing. However, some Z scores differed by crossing the chosen significance threshold (| Z |>3) while retaining their D ex directionality. For example, with sheepascertained sites the modern European goats IOG and Italian4 obtain significant negative scores (having fewer Direkli4-specific variants than the reference Neolithic Zagros group) when using sheepaligned data but not goat aligned (negative but not |Z|>3, Supplementary file 1K). This is likely due to the lower number of genomes included in the D ex calculation when using sheep-aligned data, a necessity due to the computational limitations of aligning available data to the sheep genome. Each additional genome include in the H4 "outgroup" (e.g. diverse bezoar, Capra specimen, other Direkli bezoar) should reduce the total number of sites in the D ex calculation, by filtering out variants otherwise assigned as 'Direkli4-specific'. Fewer sites will reduce the sensitivity of the test (by decreasing the number of sites included in each bootstrap iteration) while increasing the specificity of the variants. As such it is the consistency of directionality and correlation of the D ex values between different data sets which should lend confidence to the goat-aligned results presented in Figure 3-figure supplements 1-3.

Graph-based modeling
To generate a graph-based admixture model of how individual genomes relate, Treemix (Pickrell and Pritchard, 2012) was employed on the 2X-sheep ascertained dataset. The number of migration events m was varied from 0 to 5, with other parameters set as -root Sheep -k 1000 --noss --global. Node support values were estimated using 50 bootstrap replicates and the -boot option, applying node support values to the base tree using RAxML (Stamatakis, 2014).
To maximize the possibility of detecting gene flow between the Direkli4/Tur lineage, Orientagraph (Molloy et al., 2021) was employed on a reduced set of populations and at a group level. Sites were filtered to retain those with at least one call per group and a MAF of ≥0.05, leaving 104,550 sites. m was varied from 0 to 4 due to computational limitations, running with Treemix settings as above but with the addition of -mlno to find the Maximum Likelihood Network Orientation, and k of 500 due to the lower SNP number.
Finally ADMIXTOOLS2 (Maier et al., 2022) was employed to explore admixture graph space in a complementary manner. To investigate the admixture status of the 'taurasian tur' lineage in a limited graph space, six populations/genomes were included: SNPs used were the 223,772 sheep-ascertained variants described above. We followed the approach suggested by Maier et al., 2022, fitting 50 graphs per complexity class (m=0-5) using find_ graphs(stop_gen = 100, outpop = 'Sheep') using pre-computed f 2 statistics calculated over all goat autosomes and allowing missingness (extract_f2(maxmiss = 1, auto_only = F)). At m=2 (two admixture edges), the preponderance of graphs fit with a worst f 4 outlier | Z |<3.
find_graph() was rerun at m=2 for 50 iterations, with the constraint that the Neolithic Serbian group had to receive at least one admixture edge in its history, reflecting the established gene flow event from a population related to the Direkli bezoar (Daly et al., 2018;Daly et al., 2021). Duplicate graphs were removed, as were graphs with 100%/0% admixture events. The best fitting graph was compared to the remaining graphs by calculating bootstrap values for graph scores (qpgraph_resa-mple_multi(nboot = 100) and compare_fits()$p_emp), with significantly worse fitting graphs removed; several as-good-as fitting graphs retained significant f 4 outlier |Z|>3.
The remaining 11 graphs were then scored for the presence of (1) admixture edges from ETa into DIR4, an indication that the taurasian tur group has bezoar admixture, and (2) admixture edges from DIR4 into ETa, indicating that Direkli bezoar have ancestry related to the taurasian tur. In a majority of graphs (6/11), ETa is modeled as having DIR4 ancestry (median 1.5% DIR4 ancestry, mean 5.2%). Only one graph models DIR4 as a mixture of an ETa related clade and Tur1 (2% for the latter). Following the methodology suggested by Maier et al., 2022, we examined the best fitting graph at an additional complexity class (m=3) and found ETa again to be modeled as containing DIR4 ancestry (1%), demonstrating this feature to be a robust one. These results suggest that while the data does not exclude bezoar to taurasian tur gene flow (which is implied by other results, including mtDNA lineages Figure 2-figure supplement 1), 'Direkli taurasian tur' to 'Direkli bezoar' admixture was of greater consequence, with more graphs indicating Direkli bezoar received taurasian tur admixture.
All 12 graphs 'as good as' the best fitting graph for m=2 (displayed in Figure 3-figure supplement 8) are available at https://osf.io/3ecqd/, along with the distribution of log likelihood scores for m=0-5 and the best fitting graph for m=3.

Data, script, and code availability
Raw sequencing reads, aligned QCed final bam files, and mitochondrial fasta files have been deposited in ENA under the project accession PRJEB51668. Admixture graphs 'as good as' the best fitting graph are available at https://osf.io/3ecqd/. Capra taurensis has been registered under the Zoobank LSID urn:lsid:zoobank.org:act:1261A42B-B0C0-4571-87F4-8EC3B5381A88. Scripts for extended D calculation/disentangling derived allele sharing are available on GitHub and https://osf.io/3ecqd/. to export samples from Direkli Cave provided by T.C. Kültür Varlıkları ve Müzeler Genel Müdürlüğü and T.C. Ankara Valılığı, Il Kültür ve Turizm Müdürlüğü, Anadolu Medeniyetleri Müzesi Müdürlüğü (#70583208-160.99(06)-899). We thank the VarGoats consortia for use of the modern tur sequencing data in IBS analyses. Preprint version 5 of this manuscript has been peer-reviewed and recommended by Peer Community In Genomics (https://doi.org/10.24072/pci.genomics.100020), and we thank all of those involved for their time and expertise. Zooarchaeological work at Direkli has been supported by grants from the Office of Vice Provost for Research, Baylor University and a URC grant from the Office of the Vice Chancellor for Research at the University of North Carolina at Chapel Hill (Benjamin S Arbuckle). This work was supported by the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant numbers 885729-AncestralWeave, 295729-CodeX, 295375-Persia and its Neighbours) (Kevin G Daly, Conor Rossi, Valeria Mattiangeli, Daniel G Bradley, Eberhard Sauer); and supported in part by a Grant from Science Foundation Ireland under grant number 21/PATH-S/9515 (Kevin G Daly). Additional files • Transparent reporting form
The following datasets were generated: